Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Authors

Aäron van den Oord

Yazhe Li

Igor Babuschkin

Karen Simonyan

Oriol Vinyals

Koray Kavukcuoglu

George van den Driessche

Edward Lockhart

Luis C. Cobo

Florian Stimberg

Norman Casagrande

Dominik Grewe

Seb Noury

Sander Dieleman

Erich Elsen

Nal Kalchbrenner

Heiga Zen

Alex Graves

Helen King

Tom Walters

Dan Belov

Demis Hassabis

Abstract

The recently developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, and is deployed online by Google Assistant, where it serves multiple English and Japanese voices.
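The contrast the abstract draws between sequential and parallel generation can be illustrated with a toy sketch. This is not the paper's network or its training method: it assumes a hypothetical linear autoregressive model, x[t] = a * x[t-1] + z[t], chosen only because its sample-by-sample loop can be rewritten exactly as a single feed-forward map from noise to output. The function names and the coefficient `a` are illustrative assumptions.

```python
import numpy as np


def generate_sequential(noise, a=0.9):
    """Autoregressive generation: each sample waits for the previous one."""
    x = np.zeros_like(noise)
    prev = 0.0
    for t in range(len(noise)):
        x[t] = a * prev + noise[t]
        prev = x[t]
    return x


def generate_parallel(noise, a=0.9):
    """Feed-forward generation: all samples are computed from noise at once."""
    n = len(noise)
    idx = np.arange(n)
    lag = idx[:, None] - idx[None, :]  # lag[t, k] = t - k
    # Lower-triangular weights a**(t-k) for k <= t, zero otherwise.
    weights = np.where(lag >= 0, a ** np.maximum(lag, 0), 0.0)
    return weights @ noise


rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
# Both routes produce the same waveform; only the second has no time loop.
assert np.allclose(generate_sequential(z), generate_parallel(z))
```

In the toy case the parallel form is available in closed form. For a deep nonlinear model such as WaveNet no such closed form exists, which is presumably why the paper trains a separate feed-forward network from the trained WaveNet instead.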

Related papers

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-...

Speaker-Dependent WaveNet Vocoder

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as auxiliary features of WaveNet. It is expected that WaveNet can learn a sample-by-sample correspondence between speech waveform and acoustic features. The advantage of the proposed method is that it does not require (1) exp...

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinio...

Hybridnet: a Hybrid Neural Architecture to Speed-up Autoregressive Models

This paper introduces HybridNet, a hybrid neural network to speed-up autoregressive models for raw audio waveform generation. As an example, we propose a hybrid model that combines an autoregressive network named WaveNet and a conventional LSTM model to address speech synthesis. Instead of generating one sample per time-step, the proposed HybridNet generates multiple samples per time-step by ex...

Text-to-speech Synthesis System based on Wavenet

In this project, we focus on building a novel parametric TTS system. Our model is based on WaveNet (Oord et al., 2016), a deep neural network introduced by DeepMind in late 2016 for generating raw audio waveforms. It is fully probabilistic, with the predictive distribution for each audio sample conditioned on all previous samples. The model introduces the idea of convolutional layer into TTS task...

Journal: CoRR

Volume: abs/1711.10433

Publication date: 2017
